SQL Agent
SQL Agent
An SQL Agent combines the reasoning capabilities of LLMs with structured database access. Instead of writing SQL manually, users ask questions in natural language and the agent autonomously discovers the schema, constructs queries, and returns answers.
Why SQL Agents Matter
Traditional database access requires SQL knowledge. An SQL Agent bridges this gap:
graph LR
subgraph Traditional
A[User] --> B[Write SQL]
B --> C[Execute Query]
C --> D[Interpret Results]
end
subgraph SQL Agent
E[User Question] --> F[Agent Reasons]
F --> G[Auto-generates SQL]
G --> H[Natural Language Answer]
end
SQL Agents provide:
- Natural Language Interface: Ask questions without knowing SQL
- Schema Discovery: Agent explores database structure automatically
- Query Generation: Constructs correct SQL based on user intent
- Safe Execution: Read-only queries prevent accidental modifications
Architecture
synalinks.SQLAgent wires a FunctionCallingAgent with three pre-built
tools bound to a KnowledgeBase:
flowchart TD
A[User Question] --> B[SQLAgent]
B --> C{Select Tool}
C -->|Discover structure| D[get_database_schema]
C -->|View sample data| E[get_table_sample]
C -->|Execute query| F[run_sql_query]
D --> G[Schema Info]
E --> H[Sample Rows]
F --> I[Query Results]
G --> B
H --> B
I --> B
B --> J[Natural Language Answer + SQL]
The agent follows an autonomous loop: it first discovers the schema (or reads the tables from the system instructions), optionally samples data to understand the format, then constructs and executes SQL queries until it has enough information to answer.
Available Tools
| Tool | Description |
|---|---|
get_database_schema |
Returns all tables and their columns with types |
get_table_sample |
Fetches sample rows with LIMIT/OFFSET pagination |
run_sql_query |
Executes read-only SELECT queries |
Result sets are rendered as CSV by default so the LM spends fewer
input tokens reading tabular data. Set output_format="json" on the
SQLAgent if you want list-of-dicts instead.
Defining Input and Output Models
import synalinks
class Query(synalinks.DataModel):
"""A natural language query about the database."""
query: str = synalinks.Field(
description="The user's question about the data in natural language"
)
class SQLResult(synalinks.DataModel):
"""The result of the SQL agent's analysis."""
answer: str = synalinks.Field(
description="A clear, natural language answer to the user's question"
)
sql_query: str = synalinks.Field(
description="The SQL query that was executed to get the answer"
)
The Query model captures the user's natural language question. The
SQLResult model structures the output to include both a human-readable
answer and the SQL used, providing transparency into the agent's reasoning.
Building the Agent
SQLAgent takes a KnowledgeBase + LanguageModel + output data model
and handles tool wiring internally — no manual Tool definitions needed.
# Create Knowledge Base with your data models
kb = synalinks.KnowledgeBase(
uri="duckdb://my_database.db",
data_models=[Customer, Product, SalesOrder],
)
# Configure language model
lm = synalinks.LanguageModel(model="gemini/gemini-3.1-flash-lite-preview")
# Build the agent via the Functional API
inputs = synalinks.Input(data_model=Query)
outputs = await synalinks.SQLAgent(
knowledge_base=kb,
language_model=lm,
data_model=SQLResult,
max_iterations=10,
)(inputs)
sql_agent = synalinks.Program(
inputs=inputs,
outputs=outputs,
name="sql_agent",
)
Safety Considerations
Safety is the Knowledge Base's responsibility, not the agent's. The
DuckDB adapter treats read_only=True (which run_sql_query passes by
default) as the whole safety contract, enforced at two layers — both
using DuckDB's own machinery, so there are no hand-rolled keyword
blocklists (which leak false negatives via comments, string literals,
casing, or stacked statements like SELECT 1; DROP TABLE x):
- Parser check (blocks writes). Each statement is parsed with
DuckDB's own parser and rejected unless it's a
SELECT. This catches multi-statement injection,COPY (SELECT ...) TO 'file'exfiltration,ATTACH,EXPORT, and every other side-effecting statement. - Sandbox (blocks external I/O). The persistent connection has
enable_external_access=falseapplied at construction time, soSELECTtable functions that touch the host filesystem or network —read_csv,read_parquet,read_json,read_blob,read_text,glob, the httpfs/S3 variants — return a permission error.
Example Usage
# Ask a natural language question
result = await sql_agent(Query(query="Who are the top 3 customers by orders?"))
print(result["answer"])
# Output: "The top 3 customers are: Carlos Garcia ($1539.96),
# Alice Johnson ($1489.96), and Diana Chen ($379.94)"
print(result["sql_query"])
# Output: "SELECT c.name, SUM(o.total_amount) as total
# FROM Customer c JOIN SalesOrder o ON c.id = o.customer_id
# GROUP BY c.name ORDER BY total DESC LIMIT 3"
The agent automatically: 1. Discovered the Customer and SalesOrder tables 2. Understood the relationship via customer_id 3. Wrote a proper JOIN with aggregation 4. Returned both the answer and the SQL for transparency
Key Takeaways
-
One module, three tools:
synalinks.SQLAgentbundles schema discovery, table sampling, and read-only SQL execution into a single ready-to-use module. No manual tool wiring. -
Token-efficient by default:
output_format="csv"minimizes LM input tokens on wide result sets. Switch to"json"only if downstream code needs typed structured data. -
Pagination:
get_table_sampleacceptslimit/offset; for arbitrary queries the LM is instructed to includeLIMITclauses. -
Safety First: The
run_sql_querytool always passesread_only=Truetokb.sql(...). Non-SELECTstatements are rejected at the parser; the sandbox blocksread_csv/ httpfs. -
Transparent Outputs: Include the generated SQL in the output schema so users can verify and learn from the agent's reasoning.
API References
Customer
Bases: DataModel
A customer in the database.
Source code in examples/16_sql_agent.py
Product
Bases: DataModel
A product in the database.
Source code in examples/16_sql_agent.py
Query
Bases: DataModel
A natural language query about the database.
Source code in examples/16_sql_agent.py
SQLResult
Bases: DataModel
The result of the SQL agent's analysis.
Source code in examples/16_sql_agent.py
SalesOrder
Bases: DataModel
An order in the database.
Source code in examples/16_sql_agent.py
main()
async
Demonstrate the SQL agent with natural language queries.
Source code in examples/16_sql_agent.py
439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 | |
populate_knowledge_base(kb)
async
Populate the knowledge base with sample data.
Source code in examples/16_sql_agent.py
264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 | |
Source
import asyncio
import os
from dotenv import load_dotenv
import synalinks
# =============================================================================
# Data Models
# =============================================================================
class Customer(synalinks.DataModel):
"""A customer in the database."""
id: str = synalinks.Field(description="Customer ID")
name: str = synalinks.Field(description="Customer name")
email: str = synalinks.Field(description="Customer email")
country: str = synalinks.Field(description="Customer country")
class Product(synalinks.DataModel):
"""A product in the database."""
id: str = synalinks.Field(description="Product ID")
name: str = synalinks.Field(description="Product name")
category: str = synalinks.Field(description="Product category")
price: float = synalinks.Field(description="Product price")
stock: int = synalinks.Field(description="Stock quantity")
class SalesOrder(synalinks.DataModel):
"""An order in the database."""
id: str = synalinks.Field(description="Order ID")
customer_id: str = synalinks.Field(description="Customer ID")
product_id: str = synalinks.Field(description="Product ID")
quantity: int = synalinks.Field(description="Quantity ordered")
total_amount: float = synalinks.Field(description="Total order amount")
status: str = synalinks.Field(description="Order status")
class Query(synalinks.DataModel):
"""A natural language query about the database."""
query: str = synalinks.Field(
description="The user's question about the data in natural language"
)
class SQLResult(synalinks.DataModel):
"""The result of the SQL agent's analysis."""
answer: str = synalinks.Field(
description="A clear, natural language answer to the user's question"
)
sql_query: str = synalinks.Field(
description="The SQL query that was executed to get the answer"
)
# =============================================================================
# Database Setup
# =============================================================================
async def populate_knowledge_base(kb):
"""Populate the knowledge base with sample data."""
customers = [
Customer(
id="C001", name="Alice Johnson", email="alice@example.com", country="USA"
),
Customer(id="C002", name="Bob Smith", email="bob@example.com", country="UK"),
Customer(
id="C003",
name="Carlos Garcia",
email="carlos@example.com",
country="Spain",
),
Customer(
id="C004", name="Diana Chen", email="diana@example.com", country="China"
),
Customer(
id="C005", name="Emma Wilson", email="emma@example.com", country="Canada"
),
]
products = [
Product(
id="P001",
name="Laptop Pro",
category="Electronics",
price=1299.99,
stock=50,
),
Product(
id="P002",
name="Wireless Mouse",
category="Electronics",
price=49.99,
stock=200,
),
Product(
id="P003",
name="Mechanical Keyboard",
category="Electronics",
price=149.99,
stock=100,
),
Product(
id="P004",
name="USB-C Hub",
category="Accessories",
price=79.99,
stock=150,
),
Product(
id="P005",
name="Monitor Stand",
category="Accessories",
price=89.99,
stock=75,
),
Product(
id="P006",
name="Webcam HD",
category="Electronics",
price=129.99,
stock=80,
),
Product(id="P007", name="Desk Lamp", category="Office", price=45.99, stock=120),
Product(
id="P008",
name="Notebook Set",
category="Office",
price=12.99,
stock=500,
),
]
orders = [
SalesOrder(
id="O001",
customer_id="C001",
product_id="P001",
quantity=1,
total_amount=1299.99,
status="completed",
),
SalesOrder(
id="O002",
customer_id="C001",
product_id="P002",
quantity=2,
total_amount=99.98,
status="completed",
),
SalesOrder(
id="O003",
customer_id="C002",
product_id="P003",
quantity=1,
total_amount=149.99,
status="completed",
),
SalesOrder(
id="O004",
customer_id="C003",
product_id="P001",
quantity=1,
total_amount=1299.99,
status="shipped",
),
SalesOrder(
id="O005",
customer_id="C003",
product_id="P004",
quantity=3,
total_amount=239.97,
status="shipped",
),
SalesOrder(
id="O006",
customer_id="C004",
product_id="P002",
quantity=5,
total_amount=249.95,
status="pending",
),
SalesOrder(
id="O007",
customer_id="C004",
product_id="P006",
quantity=1,
total_amount=129.99,
status="completed",
),
SalesOrder(
id="O008",
customer_id="C005",
product_id="P007",
quantity=2,
total_amount=91.98,
status="completed",
),
SalesOrder(
id="O009",
customer_id="C005",
product_id="P008",
quantity=10,
total_amount=129.90,
status="completed",
),
SalesOrder(
id="O010",
customer_id="C001",
product_id="P005",
quantity=1,
total_amount=89.99,
status="pending",
),
]
for customer in customers:
await kb.update(customer.to_json_data_model())
for product in products:
await kb.update(product.to_json_data_model())
for order in orders:
await kb.update(order.to_json_data_model())
print(f" Stored {len(customers)} customers")
print(f" Stored {len(products)} products")
print(f" Stored {len(orders)} orders")
# =============================================================================
# Main Example
# =============================================================================
async def main():
"""Demonstrate the SQL agent with natural language queries."""
load_dotenv()
synalinks.clear_session()
print("=" * 60)
print("SQL Agent with Knowledge Base")
print("=" * 60)
db_path = "guides/sql_agent.db"
if os.path.exists(db_path):
os.remove(db_path)
print("\nCreating knowledge base...")
kb = synalinks.KnowledgeBase(
uri=f"duckdb://{db_path}",
data_models=[Customer, Product, SalesOrder],
wipe_on_start=True,
name="sql_agent_kb",
)
print("Populating with sample data...")
await populate_knowledge_base(kb)
print("\nBuilding SQL agent...")
lm = synalinks.LanguageModel(model="gemini/gemini-3.1-flash-lite-preview")
inputs = synalinks.Input(data_model=Query)
outputs = await synalinks.SQLAgent(
knowledge_base=kb,
language_model=lm,
data_model=SQLResult,
max_iterations=10,
)(inputs)
sql_agent = synalinks.Program(
inputs=inputs,
outputs=outputs,
name="sql_agent",
description="An agent that answers questions about data using SQL queries",
)
sql_agent.summary()
example_queries = [
"What tables are available in the database?",
"Who are the top 3 customers by total order amount?",
"What is the most popular product category?",
"Show me all pending orders with customer names",
]
print("\n" + "=" * 60)
print("SQL Agent Demo")
print("=" * 60)
for query_text in example_queries:
print(f"\nQuestion: {query_text}")
print("-" * 40)
try:
result = await sql_agent(Query(query=query_text))
# Show trajectory (tool calls and results)
messages = result.get("messages", [])
tool_calls_count = 0
for msg in messages:
if msg.get("role") == "assistant" and msg.get("tool_calls"):
for tool_call in msg["tool_calls"]:
tool_calls_count += 1
args = tool_call.get("arguments", {})
args_str = ", ".join(f"{k}={repr(v)}" for k, v in args.items())
print(
f"Tool Call {tool_calls_count}: {tool_call['name']}({args_str})"
)
elif msg.get("role") == "tool":
content = msg.get("content", "")
# Truncate long results for readability
if isinstance(content, dict):
content = str(content)
if len(content) > 200:
content = content[:200] + "..."
print(f"Tool Result: {content}")
print(f"\nAnswer: {result['answer']}")
print(f"SQL: {result['sql_query']}")
except Exception as e:
print(f"Error: {e}")
print()
if os.path.exists(db_path):
os.remove(db_path)
if __name__ == "__main__":
asyncio.run(main())